Abstract



We study the Likelihood function of data given $f_{\rm NL}$ for the so-called local type of
non-Gaussianity. In this case the curvature perturbation is a non-linear function, local in
real space, of a Gaussian random field. We compute the Cramer-Rao bound for $f_{\rm NL}$ and
show that for small values of $f_{\rm NL}$ the 3-point function estimator saturates the bound
and is equivalent to calculating the full Likelihood of the data. However, for sufficiently
large $f_{\rm NL}$, the naive 3-point function estimator has a much larger variance than
previously thought. In the limit in which the departure from Gaussianity is detected with high
confidence, error bars on $f_{\rm NL}$ only decrease as $1/\ln N_{\rm pix}$ rather than
$N_{\rm pix}^{-1/2}$ as the size of the data set increases. We identify the physical origin of
this behavior and explain why it only affects the local type of non-Gaussianity, where the
contribution of the first multipoles is always relevant. We find a simple improvement to the
3-point function estimator that makes the square root of its variance decrease as
$N_{\rm pix}^{-1/2}$ even for large $f_{\rm NL}$, asymptotically approaching the Cramer-Rao
bound. We show that using the modified estimator is practically equivalent to computing the
full Likelihood of $f_{\rm NL}$ given the data. Thus other statistics of the data, such as the
4-point function and Minkowski functionals, contain no additional information on $f_{\rm NL}$.
In particular, we explicitly show that the recent claims about the relevance of the 4-point
function are not correct. By direct inspection of the Likelihood, we show that the data do not
contain enough information for any statistic to be able to constrain higher order terms in the
relation between the Gaussian field and the curvature perturbation, unless these are orders of
magnitude larger than the size suggested by the current limits on $f_{\rm NL}$. As our main
focus is the scaling with $N_{\rm pix}$ of the various quantities, calculations are done in
flat sky approximation and without the radiation transfer function.



1 Introduction 



In single field slow-roll inflation the level of non-Gaussianity is sharply predicted and very
small, less than $10^{-6}$ [1,2]. This is quite far from the present experimental sensitivity and
probably not attainable with either CMB observations or galaxy redshift surveys. As a
result, deviations from purely Gaussian statistics of density perturbations, if observed,
could provide important constraints on models of early cosmology, forcing us to abandon
the single-field slow-roll paradigm.

Of course there are many ways in which a signal could be "non-Gaussian". Given a 
data set, such as the WMAP maps, there are two possible ways to proceed. One could 
calculate all kinds of statistics of the data and compare the results with the expectation 
for a Gaussian field searching for anomalies. This is a fine strategy as long as one adjusts 
the significance of the result to account for the number of possible deviations that have 
been explored. There are several anomalies in the WMAP data reported in the literature 
that have been found in this way (see for example [3-5]). Unfortunately their significance 
is hard to assess and as a result one is not sure how seriously to take them. 

The second approach is to think about the possible physical mechanisms that can 
lead to non-Gaussianities and search for their particular signatures. In the context of 
primordial effects one should investigate what types of non-Gaussianity can plausibly be 
produced in various inflationary models. This approach is further bolstered by the fact 
that at least at the level of the 3-point function, primordial signals seem to fall into two 
definite classes. Thus there are only two different signatures one has to look for. 

The analysis of inflationary models that go beyond the single field slow-roll class has 
identified several examples with a relatively high level of non-Gaussianity, within reach 
of present or forthcoming experiments. For nearly Gaussian fluctuations, the quantity 
most sensitive to departures from perfect Gaussianity is the 3-point correlation function. 
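In general, each model will give a different correlation between the Newtonian potential
modes*:

$$\langle \Phi(\mathbf{k}_1)\Phi(\mathbf{k}_2)\Phi(\mathbf{k}_3)\rangle = (2\pi)^3\,\delta^3(\mathbf{k}_1+\mathbf{k}_2+\mathbf{k}_3)\,F(k_1,k_2,k_3) \, . \tag{1}$$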

The function F describes the correlation as a function of the triangle shape in momentum 
space. 

The predictions for the function $F$ in different models divide quite sharply into two
qualitatively different classes as a consequence of qualitatively different ways of producing
correlations among modes [8]. The first possibility is that the source of density perturbations
is not the inflaton but a second light scalar field $\sigma$. In this case non-Gaussianities
are generated by the non-linear relation between the fluctuation $\delta\sigma$ of this field and
the final perturbation $\Phi$ we observe. This non-linearity is local as it acts when the modes
are well outside the horizon; schematically we have
$\Phi(\mathbf{x}) = g(\mathbf{x}) + f_{\rm NL}\,(g^2(\mathbf{x}) - \langle g^2\rangle) + \ldots$,
where $g$ is a Gaussian random field. The quadratic piece introduces a 3-point function for
$\Phi$ of the form

$$F(k_1,k_2,k_3) = f_{\rm NL}\cdot 2\Delta_\Phi^2\cdot\left(\frac{1}{k_1^3k_2^3}+\frac{1}{k_1^3k_3^3}+\frac{1}{k_2^3k_3^3}\right) , \tag{2}$$

where $\Delta_\Phi$ is the power spectrum normalization,
$\langle\Phi(\mathbf{k}_1)\Phi(\mathbf{k}_2)\rangle = (2\pi)^3\,\delta^3(\mathbf{k}_1+\mathbf{k}_2)\,\Delta_\Phi\,k_1^{-3}$,
which has been taken as exactly scale invariant. Examples of this mechanism are the
curvaton scenario [9] and the variable decay width model [10], which naturally give rise to
$f_{\rm NL}$ greater than 10 and 5, respectively. Various subtleties in estimating the size of
this type of non-Gaussianity will be the focus of this paper.

*Even with perfectly Gaussian primordial fluctuations, the observables, e.g. the temperature
anisotropy, will not be perfectly Gaussian as a consequence of the non-linear relation between
primordial perturbations and what we will eventually observe. These effects are usually of order
$10^{-5}$ (see for example [6,7]) and thus beyond (but not much) present sensitivity. In the
following we will disregard these contributions.

The second class of models are single field models with a non-minimal Lagrangian,
where the correlation among modes is created by higher derivative operators [11-15].
In this case, the correlation is strong among modes with comparable wavelength and it
decays when we take one of the $k$'s to zero keeping the other two fixed. Although different
models of this class give a different function $F$, all these functions are qualitatively very
similar. We will call functions of this kind equilateral because the signal is maximal for
equilateral configurations in Fourier space, whereas for the local form (2) the most relevant
configurations are the squeezed triangles with one side much smaller than the others. We
will not discuss the equilateral type of non-Gaussianity in detail in this paper. We will
just point out that the effects studied in this paper do not apply in that case, so that the
situation is much simpler.
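To make the contrast between the two kinds of triangles concrete, the following minimal sketch evaluates the local template of eq. (2) on an equilateral and on a squeezed configuration. The function name `local_shape` and the unit values used for $f_{\rm NL}$ and $\Delta_\Phi$ are illustrative assumptions, not part of the analysis.

```python
def local_shape(k1, k2, k3, f_nl=1.0, delta_phi=1.0):
    """Local-type shape function of eq. (2) for a scale-invariant spectrum."""
    return f_nl * 2.0 * delta_phi**2 * (
        1.0 / (k1**3 * k2**3) + 1.0 / (k1**3 * k3**3) + 1.0 / (k2**3 * k3**3)
    )

# Equilateral triangle: all three sides equal.
f_equilateral = local_shape(1.0, 1.0, 1.0)
# Squeezed triangle: one side much shorter than the other two.
f_squeezed = local_shape(0.01, 1.0, 1.0)

ratio = f_squeezed / f_equilateral
print(f"F(squeezed) / F(equilateral) = {ratio:.3e}")
```

The template grows like $k_1^{-3}$ as the short side shrinks, which is why squeezed triangles dominate the local signal.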

The strongest constraint on $f_{\rm NL}$ comes from analyzing the 3-point function of the
WMAP data set. WMAP is the best available data set because it has the largest number
of pixels measured with good signal-to-noise. From the first year data the constraint is [16]:

$$-27 < f_{\rm NL} < 121 \quad \text{at 95\% C.L.} \tag{3}$$

This constraint is better than that obtained by the WMAP collaboration both using the
one year WMAP data [17] and the three year data [18]. This is so because the WMAP
team used a non-optimal estimator which did not adequately treat the effect of anisotropic
noise, as already noted in [17,19]. In [16], we showed that the effect of the anisotropic
noise can be substantially reduced with the addition of a linear piece to the estimator.
Also in [16] we constrained the level of the equilateral 3-point function. For both
types, the departures from Gaussianity still allowed by the data are at the same level.

Given the interest in constraining the level of non-Gaussianity, one may wonder if a
statistic other than the 3-point function might extract more information about $f_{\rm NL}$.
There are various contradictory, or at least apparently contradictory, answers to this question
in the literature. On the one hand in [20] and [16] it is argued that the 3-point function
saturates the Cramer-Rao bound up to terms of order $f_{\rm NL}A^{1/2}$, where $A$ is the square
of the amplitude of curvature perturbations: $A^{1/2} \sim 10^{-5}$. On the other hand
calculations of the signal to noise in the 4-point function by [21] and [22] point to a
different conclusion. These papers claim that, even though in the limit of small $f_{\rm NL}$
the signal to noise ratio of the 4-point function is negligible, it grows more rapidly with the
number of pixels in the data set than for the 3-point function. As a result, for rather small
values of $f_{\rm NL}$, say around 50 for an experiment like Planck, the signal to noise in the
4-point function is larger than for the 3-point function and stronger constraints on
$f_{\rm NL}$ could be placed by studying the 4-point function. Of course this result is puzzling.
One is immediately drawn to the question, what about the 5-point function? And why not the
11-point function? Applying the same arguments as in [21] and [22] would show that the signal to
noise ratio becomes larger the higher the n-point function considered. Clearly there is a
contradiction.

It is the aim of this paper to clarify this contradiction. We will show that both
calculations have missed an interesting subtlety of the local type of non-Gaussianity in the
case of scale invariant, or nearly scale invariant, spectrum of primordial perturbations. As
a result, the calculation of the noise of various estimators (including the 3-point function)
for finite $f_{\rm NL}$ is missing some relevant terms. Some of the terms that are naively down
by powers of $f_{\rm NL}A^{1/2}$ are actually much larger, being enhanced by $N_{\rm pix}$. The
growth in the signal to noise for high $f_{\rm NL}$ seen in the above papers is fictitious. We
will show that the same subtlety creeps into the calculation of [20] and thus the 3-point
function estimator considered there also does not saturate the Cramer-Rao bound for large
$f_{\rm NL}$. We want to stress that even though what was missed was a rather subtle point, it
has potentially large consequences on the signal to noise of the estimators previously
considered. For example when one is in the regime of large signal to noise, the error bars on
$f_{\rm NL}$ from the 3-point function decrease as $1/\ln N_{\rm pix}$ rather than
$N_{\rm pix}^{-1/2}$. The reader at this point should not panic: we will show that the
Cramer-Rao bound in this regime still scales as $N_{\rm pix}^{-1/2}$ and that it is rather
straightforward to extract all of this information from the data either by calculating the
full Likelihood or slightly tweaking the 3-point function estimator.

What is the missing subtlety? To understand it, it is best to recall the main
effect of the local non-Gaussianity: it correlates large and small scales. In the 3-point
function, a long wavelength mode modulates the amplitude of all the short wavelengths
by the same amount, regardless of the wavelength of the short mode. Furthermore, in [8] we
showed that most of the signal in the 3-point function comes from squeezed triangular
configurations in Fourier space. More importantly for this discussion, if one considers the
signal to noise as a function of the wavenumber of the long wavelength mode $k_L$, one
gets an equal amount of information from every logarithmic interval in $k_L$. This in fact
is the source of the problem. The smallest $k_L$ in the survey are by definition the ones
with the largest cosmic variance, as there are the fewest of them in the survey. As one
increases the resolution of the survey the contribution to the signal to noise from the long
wavelengths only decreases logarithmically and thus the large cosmic variance of the long
modes translates into large variances in the estimators of $f_{\rm NL}$.

In fact, in [16] the importance of the long wavelength modes was emphasized in the context
of the effect of anisotropic noise on the estimator of the 3-point
function. The noise in the map is anisotropic because WMAP spent different amounts of
time observing each pixel on the sky. As a result the level of small scale power, for large
multipoles where the noise becomes important, varies across the sky. This map of small
scale power can randomly align with a particular large scale mode, giving a spurious
$f_{\rm NL}$ signal. Of course on average this effect is zero as there is no intrinsic
correlation between the map of observing time per pixel and the large scale temperature. However
for a particular realization, some modes will be correlated (spurious positive $f_{\rm NL}$) and
others anti-correlated (spurious negative $f_{\rm NL}$). The contribution to the signal from the
long wavelength modes will not add exactly to zero, as we have few of them in the survey.
The random left over spurious signal effectively increases the variance of the estimator.
This effect was noted in the WMAP team analysis [17,19], where the constraint on $f_{\rm NL}$
got worse as they increased the size of the data set by including more of the small scales.
In [16] the estimator was improved by including a linear piece which substantially reduces
the effect, allowing us to get better constraints. This effect is the reason why the 3-year
data analysis by the WMAP team [18] did not appreciably improve the limits.

To clarify the situation we will study the full Likelihood of the data given $f_{\rm NL}$. We
will keep careful track of enhanced terms and thus do a consistent expansion in $f_{\rm NL}$. We
will then calculate the Cramer-Rao bound, extending the results of [20] to non-zero
$f_{\rm NL}$. We will show how the additional terms in the variance of the 3-point function
estimator make it become sub-optimal. This can be easily fixed using an improved estimator which
asymptotically saturates the Cramer-Rao bound. The use of this estimator is equivalent
to the full Likelihood of the data.

The fact that the improved 3-point function estimator is equivalent to the full Likelihood
of the data implies that there is no additional information in the 4-point function.
We will also show this explicitly, illustrating how at best the 4-point function is
equivalent to the 3-point function. No other statistic, such as Minkowski functionals, various
wavelet based statistics and other esoteric constructions, is worth trying in order to constrain
$f_{\rm NL}$. None can be better than the 3-point function. This is true up to corrections of
order $f_{\rm NL}A^{1/2} < 10^{-3}$.

The apparent large signal to noise in the 4-point function led to the suggestion [21,22]
that the 4-point function could even be sensitive to higher order terms in the relation
between $\Phi$ and $\delta\sigma$:
$\Phi(\mathbf{x}) = g(\mathbf{x}) + f_{\rm NL}\,(g^2(\mathbf{x}) - \langle g^2\rangle) + f_{\rm NL}^2\,\alpha\,g^3(\mathbf{x}) + \ldots$
The claim was that the 4-point function could constrain the real parameter $\alpha$. Of course
the third term is a minuscule correction to the first two, even for the largest allowed values
of $f_{\rm NL}$. Thus it is difficult to understand how one could be sensitive to it. Again the
missing terms in the variance of the estimator were responsible for the apparent sensitivity to
$\alpha$. Using the full Likelihood we will show that for any realistic experiment there is in
fact not enough information about $\alpha$ in the data to constrain it, unless
$\alpha \gtrsim 1/(f_{\rm NL}A^{1/2}) \gtrsim 10^{3}$.

Finally we will also show that for a realistic experiment where $\ln N_{\rm pix}$ is large, the
value of $f_{\rm NL}$ for which improving the naive 3-point function estimator is important is
rather large. One should start worrying about it once there is a many-$\sigma$ detection of
$f_{\rm NL}$. As a result our improved estimator will probably be only of academic interest. Our
paper mainly provides clarification of various misconceptions in the literature. Given this and
to reduce the length of our equations, we will work in the flat sky approximation and neglect
the CMB transfer functions, directly working with a 2-dimensional random field with a
local non-Gaussianity. Expressions for the 3 and 4-point functions including the radiation
transfer function and with spherical geometry can be found for instance in [22,23].

We want to stress† that our approximation captures the qualitative features of the
real problem, like the dependence of the various expressions on the number of data
$N_{\rm pix}$. On the other hand one should not trust the numerical factors that we will find,
because they would be changed in a complete treatment, where projection effects from 3 to 2
dimensions are taken into account together with the full radiation transfer function. The
reason why the qualitative features are captured by our approximation is that, although
a given mode on the sky receives contributions from a range of different 3D wavelengths,
this effect is limited to an interval $\Delta k \sim k$. This implies that a squeezed
configuration of the 3-point function of primordial perturbations $k_1 \ll k_2, k_3$ (which, as
we will see, are the only configurations material to our conclusions) maps to a correlation
among different multipoles with $l_1 \ll l_2, l_3$. In other words, although the 3-point
function of the 2-dimensional temperature map is not exactly of the local form, it still roughly
behaves as $\langle a_{l_1}a_{l_2}a_{l_3}\rangle \propto l_1^{-2}\,l_2^{-2} + \text{perm.}$,
which is the 2-d analogue of the local shape with additional modulations induced by the transfer
function. As we will discuss, our results just depend on the $l_1^{-2}$ dependence for
$l_1 \to 0$ (compared for instance with the weaker dependence typical of other
shapes of non-Gaussianity), so that it is enough to stick to an exactly local non-Gaussianity
in 2 dimensions. The flat sky approximation will also change the numerical
factors, but leave the qualitative features unaltered.

†We thank the unknown referees for correspondence about this point.

We also point out that even though we do all our calculations in 2 dimensions, as 
relevant to the CMB, our conclusions are equally valid for 3 dimensional surveys, such as 
galaxy surveys or future 21 cm observations. 



2 Subtleties of the $f_{\rm NL}$ expansion

The aim of this section is to explicitly show the presence of terms in the variance of the
3-point function estimator of $f_{\rm NL}$ that are enhanced by factors of $N_{\rm pix}$ and
thus contribute significantly even though they are naively suppressed by $f_{\rm NL}A^{1/2}$. We
start by introducing our notation and reproducing previous calculations of the variance of the
estimator. We then identify the terms that had been previously missed and give a rule of thumb
to easily determine when they are important. We will show that in practice the enhanced terms
do not correct current upper limits and that they will only become important after a very
high signal to noise detection of $f_{\rm NL}$.

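Notation. We work in the flat sky approximation, neglect the transfer function, and
assume that the error is dominated by cosmic variance. In this paper we are mostly
interested in the scaling properties of the estimators for the non-Gaussianities used for example
in [16,17,20-22,24,25]. These properties are not modified by these approximations‡, while
on the other hand their use makes the presence of some physical effects much clearer, as
we will later see. Let us briefly set up our conventions. For the Fourier transform we have:

$$\Phi_{\vec l} = \frac{\Omega}{N_{\rm pix}}\sum_{\vec\theta} e^{-i\vec l\cdot\vec\theta}\,\Phi_{\vec\theta} \, , \tag{4}$$

where $\Omega$ and $N_{\rm pix}$ are respectively the angular size and the number of pixels of
the sky survey. It is immediate to obtain from this the continuum limit
$\Phi_{\vec l} = \frac{\Omega}{N_{\rm pix}}\sum_{\vec\theta} e^{-i\vec l\cdot\vec\theta}\,\Phi_{\vec\theta} \to \int d^2\theta\, e^{-i\vec l\cdot\vec\theta}\,\Phi_{\vec\theta}$.
We also have:

$$\Phi_{\vec\theta} = \frac{1}{\Omega}\sum_{\vec l} e^{i\vec l\cdot\vec\theta}\,\Phi_{\vec l} \, , \tag{5}$$

and the useful relations:

$$\sum_{\vec\theta} e^{i\vec l\cdot\vec\theta} = N_{\rm pix}\,\delta_{\vec l,0} \, , \qquad \sum_{\vec l} e^{i\vec l\cdot\vec\theta} = N_{\rm pix}\,\delta_{\vec\theta,0} \, . \tag{6}$$

‡This is confirmed by the fact that we are able to recover the same scaling properties found in
[20-22,24,25] where these approximations were not used.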






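We are only interested in local non-Gaussianities as the effect we will discuss does not
apply to other types. In that case, the observed field $\Phi_{\vec\theta}$ is given by a
non-linear function of a Gaussian field $g_{\vec\theta}$ which is local in real space:

$$\Phi_{\vec\theta} = f(g_{\vec\theta}) = g_{\vec\theta} + f_{\rm NL}\left(g_{\vec\theta}^2 - \sigma^2\right) . \tag{7}$$

We will call this field temperature although our results apply to other measures, not just
the CMB temperature. In Fourier space the local relation reads:

$$\Phi_{\vec l} = (f(g))_{\vec l} = g_{\vec l} + f_{\rm NL}\left[(g\circ g)_{\vec l} - \Omega\,\sigma^2\,\delta_{\vec l,0}\right] , \tag{8}$$

where we have defined $(g\circ g)_{\vec l} = \frac{1}{\Omega}\sum_{\vec k} g_{\vec l-\vec k}\,g_{\vec k}$.
We will explicitly address later the case of possible higher order corrections in $f_{\rm NL}$
to these definitions. The covariance matrix is defined as

$$\langle g_{\vec l_1}\,g_{\vec l_2}\rangle = C_{\vec l_1,\vec l_2} = \Omega\,C_{l_1}\,\delta_{\vec l_1,-\vec l_2} \, , \tag{9}$$

where $l = |\vec l|$, and $C_l = 2\pi A/l^2$, which then implies that

$$\sigma^2 = \langle g_{\vec\theta}^2\rangle = \frac{1}{\Omega}\sum_{\vec l}\frac{2\pi A}{l^2} \simeq \frac{1}{2}\,A\,\ln N_{\rm pix} \, , \tag{10}$$

where in the last passage we have used the continuum limit, and the fact that
$N_{\rm pix} \simeq \Omega\,l_{\rm max}^2/(4\pi)$, with $l_{\rm max}$ the maximum of the observed
$l$s. From here on, in order to simplify the notation, we will remove the vector symbol from $l$
and $\theta$ in all the mathematical expressions when the meaning and the distinction from the
modulus $l = |\vec l|$ and $\theta = |\vec\theta|$ is clear from the context.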
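The logarithmic growth of the pixel variance with the range of observed multipoles can be checked with a short numerical sketch. It assumes a square patch of side $L$ with periodic boundary conditions, so that the modes sit on the lattice $\vec l = (2\pi/L)(n_x, n_y)$ and $\Omega = L^2$; the function name `pixel_variance` and the circular cut $|n| \le n_{\rm max}$ are illustrative choices. With these assumptions $l_{\rm max}/l_{\rm min} = n_{\rm max}$.

```python
import math

def pixel_variance(n_max, amplitude=1.0):
    """(1/Omega) * sum over modes of C_l, with C_l = 2*pi*A / l**2.

    Modes are l = (2*pi/L) * (nx, ny) on a square patch of side L
    (Omega = L**2). The l = 0 mode is excluded and the sum is cut at
    |n| <= n_max. The patch size L drops out of the result.
    """
    total = 0.0
    for nx in range(-n_max, n_max + 1):
        for ny in range(-n_max, n_max + 1):
            n2 = nx * nx + ny * ny
            if 0 < n2 <= n_max * n_max:
                total += 1.0 / n2
    # (1/L^2) * 2*pi*A * L^2 / (4*pi^2 * n^2) = A / (2*pi*n^2)
    return amplitude / (2.0 * math.pi) * total

for n_max in (50, 100, 200):
    print(n_max, pixel_variance(n_max), math.log(n_max))
```

Each doubling of $n_{\rm max}$ adds approximately $A\ln 2$ to the variance, so the sum tracks $A\ln(l_{\rm max}/l_{\rm min})$ up to an additive constant that depends on the geometry of the cut.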



Previous results: the missing enhanced terms. In [16, 17] the analysis for 
the non-Gaussianities of the local kind was performed using a trilinear estimator with 
signal-to-noise weighting. In the limit of flat sky, unit transfer function, and isotropic 
noise it reduces to: 

I ' 

where we have defined the field $\chi_l = (\Phi\circ\Phi)_l - \Omega\,\sigma^2\,\delta_{l,0}$,
and where we consider only non-degenerate configurations with all the $\Phi$s taken with
$l \neq 0$. The normalization

with the subscript 1 meaning that the expectation value is taken with $f_{\rm NL} = 1$, has been
chosen so that the estimator is unbiased, $\langle\mathcal{E}\rangle = f_{\rm NL}$. The
definition in eq. (7) tells us that the temperature field in the sky is to a good approximation
a Gaussian field, with a small non-Gaussian correction of order
$f_{\rm NL}\Phi \sim f_{\rm NL}A^{1/2} \lesssim 10^{-3}$. So one is tempted to
expect that higher order corrections in $f_{\rm NL}$ in the various expressions are suppressed
with respect to the leading terms by powers of $f_{\rm NL}A^{1/2}$ and thus irrelevant. For
example, the variance of the estimator in eq. (11) starts with a piece which is of zeroth order
in $f_{\rm NL}$, and which gives:



where $\Delta\mathcal{E} = \mathcal{E} - \langle\mathcal{E}\rangle$. One would naively assume
that this is the dominant term in the variance, with small corrections of order
$f_{\rm NL}A^{1/2}$, which should contribute at most at order $10^{-3}$ given the current bound
on $f_{\rm NL}$. This is what was assumed for example in [16,17,20], where the estimator (11) was
in fact found by minimizing the variance at zeroth order in $f_{\rm NL}$ among all trilinear
estimators. However, as we will soon see, for the case of local non-Gaussianities, and only for
them, there is another parameter which enters into the expansion: $N_{\rm pix}$. We will see in
fact that in certain expressions, such as the variance of the estimator above, there are terms
that, although suppressed by powers of $f_{\rm NL}A^{1/2}$, are enhanced by powers of
$N_{\rm pix}$, and so, depending on the real value of $f_{\rm NL}$, they might need to be taken
into account.

§Some of the expressions, like eq. (12) above, when expressed in terms of $N_{\rm pix}$ and $A$,
will slightly depend on the geometry of the survey, which changes the boundary of the domain of
integration in Fourier space. We will also have similar corrections going from flat to full sky.
However these effects do not significantly change our results.

In order to verify that this is actually the case, and to understand the implications of
this fact, let us sketch the computation of the variance of the estimator $\mathcal{E}$ keeping
higher order terms in $f_{\rm NL}$. The variance of $\mathcal{E}$ will involve the computation
of a 6-point function, which will split into the sum of products of several different
combinations of connected n-point functions, i.e. the product of three 2-point functions, of two
3-point functions, and of a 4-point function and a 2-point function. Concentrating on the last
kind of contribution, we will have terms like:



1 1 



)c , (14) 



where $\langle\Phi_{l_1}\Phi_{l_2}\cdots\Phi_{l_n}\rangle_c$ stands for the connected n-point
function. Now, apart from numerical factors, one of the terms in the expansion of the connected
4-point function reads:



Considering the effect of this term in the variance in eq. (14), where we also take the
2-point function at zeroth order in $f_{\rm NL}$, we obtain:

{A£^):,^Y,Cl^-^^^^J^^- (16) 

hhh 



Thus this contribution to the relative variance does not decrease as
$1/(N_{\rm pix}\ln N_{\rm pix})$, as one would have naively expected, but there is an
enhancement of $N_{\rm pix}$ which makes it decrease only as $1/\ln^2 N_{\rm pix}$.



¶Notice that the variance of the estimator scales faster than the naive $1/N_{\rm pix}$ by a
factor of $\ln N_{\rm pix}$. This behaviour is typical of non-Gaussianities of the local kind,
where the signal comes from the correlation of the modes of all different scales.

There is a very physical reason for the presence of such enhanced terms for the case
of local non-Gaussianities. As we have already said, most of the signal for this kind of
non-Gaussianities comes from squeezed configurations, where one of the $l$s is small and the
others are large. More precisely, for the 3-point function, the signal-to-noise from all the
squeezed configurations with the smallest $l$ in a given decade is roughly the same for every
decade, although there are far fewer modes in a decade of low $l$. This means that the
low $l$ modes are always very important for the estimator $\mathcal{E}$. Now, the point is that
there is an intrinsic variance associated with a configuration with a certain small $l$, simply
because there are very few of those small $l$s, just $2l+1$; and this is unaffected by the fact
that $N_{\rm pix}$ of the survey increases, because this just increases the $l_{\rm max}$ of the
experiment. Therefore the relative variance of the estimator due to these terms decreases only
logarithmically with $N_{\rm pix}$, because this is how the relative importance of the
configurations with small $l$s decreases with $N_{\rm pix}$. This physical explanation guarantees
that these enhanced terms are not present in the case of equilateral non-Gaussianities, where
the importance of the small $l$s and of the squeezed configurations is marginal.

It is useful to develop a quick rule of thumb to understand whether a term which is a sum
of products of different $C_l$s is enhanced or not: a term will be enhanced only if at least
one $C_l$ is raised to a power larger than one. In fact in this case the summation over the
multipoles for this term will be dominated by the lowest $l$s, so that some $l_{\rm min}$ will
appear in the denominator. This makes these terms enhanced by powers of
$l_{\rm max}/l_{\rm min}$ (see also appendix A).
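The rule of thumb can be illustrated numerically. The sketch below sums $|n|^{-2p}$ over a two-dimensional lattice of modes, which mimics a sum of $C_l^p$ for $C_l \propto 1/l^2$, and reports how much of the total comes from the few lowest modes. The lattice of integer wavevectors, the cut at $|n| \le 3$ and the function name `lattice_power_sum` are assumptions made for illustration only.

```python
def lattice_power_sum(n_max, power, n_cut=None):
    """Sum of |n|**(-2*power) over 2D lattice modes with 0 < |n| <= n_max.

    Since C_l is proportional to 1/l**2, power=1 mimics a sum of C_l and
    power=2 a sum of C_l**2. If n_cut is given, only modes with
    |n| <= n_cut are kept, isolating the lowest multipoles.
    """
    limit = n_max if n_cut is None else min(n_cut, n_max)
    total = 0.0
    for nx in range(-limit, limit + 1):
        for ny in range(-limit, limit + 1):
            n2 = nx * nx + ny * ny
            if 0 < n2 <= limit * limit:
                total += 1.0 / n2**power
    return total

n_max = 200
for power in (1, 2):
    full = lattice_power_sum(n_max, power)
    low = lattice_power_sum(n_max, power, n_cut=3)
    print(f"power={power}: fraction from |n| <= 3 is {low / full:.2f}")
```

With a single power of $C_l$ the lowest modes carry only a modest share of the sum, which keeps growing logarithmically with $n_{\rm max}$. With $C_l^2$ the sum is almost entirely fixed by the lowest modes, which is the origin of the $l_{\rm max}/l_{\rm min}$ enhancement.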

The relevance of the additional terms. Our discussion shows that the treatment
of expressions containing $f_{\rm NL}$ is delicate in the case of local non-Gaussianities. The
expansion parameter is not just $f_{\rm NL}A^{1/2}$, but there are terms which can
parametrically go as $f_{\rm NL}A^{1/2}N_{\rm pix}$, and therefore cannot be neglected. We need
to understand the relevance of these terms both for existing limits on $f_{\rm NL}$ and for
future measurements. After a careful calculation, we find the following expression for the
variance of $\mathcal{E}$:



where $\cdots$ represents terms suppressed by powers of $f_{\rm NL}A^{1/2}$ without any further
$N_{\rm pix}$ enhancement. This result shows an important feature of this estimator. Imagine
that we have a series of experiments with increasing $N_{\rm pix}$, and that at some point we
detect a non-null $f_{\rm NL}$. Then, at first the variance will decrease as
$1/(N_{\rm pix}\ln N_{\rm pix})$, but, after a critical $N_{\rm pix}$ which depends on the
actual value of $f_{\rm NL}$, and which is basically, apart from logarithms, when the
signal-to-noise is of order 1, the variance will begin to decrease very slowly as
$1/\ln^2 N_{\rm pix}$, because of the enhanced variance of the term proportional to
$f_{\rm NL}^2$.

In the analysis performed in [16,17] the variance for a non-zero $f_{\rm NL}$ was assumed to
be the same as for $f_{\rm NL} = 0$, expecting that the $f_{\rm NL}$ corrections would be small.
In the light of the results of this section, we see that this procedure is not always justified.
However, for those analyses, we can verify that the error introduced is very small, as
already numerically checked with non-Gaussian Monte Carlo simulations in [17,23]. We can
quantify the error in this way: the relative correction to the variance for an $f_{\rm NL}$ at
$n\sigma_0$ from the origin, where $\sigma_0^2$ is the variance computed at $f_{\rm NL} = 0$, is
of order $2n^2/(\pi\ln^2 N_{\rm pix})$. For the WMAP experiment $\ln^2 N_{\rm pix} \simeq 35$,
therefore this correction is large for $n$ larger than $\sim 6, 7$. Therefore, if we wish to
give a $2\sigma$ confidence interval around a certain central value, we see that the enhanced
terms will become important for a central value around $4, 5\,\sigma_0$ from the origin, i.e.
in the case of a clear detection of a non-zero $f_{\rm NL}$. In the analysis of [16,17], the
central value of $f_{\rm NL}$ is of the order of only one $\sigma_0$ from the origin. Because of
this, the approximation made in [16,17] of considering for a non-zero $f_{\rm NL}$ the variance
at $f_{\rm NL} = 0$ is numerically justified, with a small error at the percent level,
well beneath the error coming from other sources, for example from the uncertainty in the
cosmological parameters, which gives an error on the variance of order ten percent [16].
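The numbers quoted above follow directly from the relative correction $2n^2/(\pi\ln^2 N_{\rm pix})$. The sketch below simply evaluates that expression; the function names are illustrative, and the default value 35 for $\ln^2 N_{\rm pix}$ is the WMAP figure quoted in the text.

```python
import math

def relative_variance_correction(n_sigma, ln2_npix=35.0):
    """Relative correction 2 n^2 / (pi ln^2 N_pix) quoted in the text.

    n_sigma is the distance of f_NL from zero in units of sigma_0.
    """
    return 2.0 * n_sigma**2 / (math.pi * ln2_npix)

def n_sigma_for_order_one_correction(ln2_npix=35.0):
    """Value of n at which the relative correction equals one."""
    return math.sqrt(math.pi * ln2_npix / 2.0)

for n in (1, 2, 5, 7):
    print(n, round(relative_variance_correction(n), 3))
print(round(n_sigma_for_order_one_correction(), 2))
```

At $n = 1$ the correction is at the percent level, and it reaches order one between $n = 7$ and $n = 8$, consistent with the statements above.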

Summarizing, we conclude that the enhanced terms will not be important until there
is a clear detection of a non-zero $f_{\rm NL}$. If that happens, they will have to be taken
into account. At that point, the variance of the estimator $\mathcal{E}$ will begin to decrease
as $1/\ln^2 N_{\rm pix}$. Given the very slow convergence of $\mathcal{E}$ in this regime, one
is led to wonder whether a better estimator exists.

3 Likelihood Calculation 

For non-Gaussianities of the local type, it is easy to calculate the full Likelihood for
$f_{\rm NL}$ given the data and determine to what extent the data are able to constrain
$f_{\rm NL}$. This is true even in the high signal to noise limit where the previous estimator
has an increased variance.

With the full Likelihood it is possible to determine the minimum variance
that an estimator of $f_{\rm NL}$ can have, the so-called Cramer-Rao bound. The bound on the
variance is $\langle\partial^2\mathcal{L}/\partial f_{\rm NL}^2\rangle^{-1}$, where
$\mathcal{L}$ is minus the logarithm of the Likelihood. In [20] it was proved that the estimator
$\mathcal{E}$ of the former section, whose variance scales as $1/(N_{\rm pix}\ln N_{\rm pix})$,
saturates this bound at order zero in $f_{\rm NL}A^{1/2}$. However, we have just learned that
this expansion in powers of $f_{\rm NL}A^{1/2}$ breaks down when $f_{\rm NL}$ is detected because
of the presence of enhanced terms. It is therefore worth asking what happens to the Cramer-Rao
bound in the same regime, and checking whether there are enhanced terms also in this case.

By the end of this section, we will see that the Likelihood allows for an expansion in
powers of $f_{\rm NL}A^{1/2}$ without $N_{\rm pix}$ enhancements, and therefore that the
Cramer-Rao bound in the presence of a non-null $f_{\rm NL}$ is affected only marginally by terms
suppressed by powers of $f_{\rm NL}^2 A$. This will tell us that the estimator $\mathcal{E}$ of
the previous section is just a bad estimator in the large signal to noise regime, and that in
principle there can be estimators whose variance in this regime scales as
$1/(N_{\rm pix}\ln N_{\rm pix})$.

Full Likelihood and Cramer-Rao bound: leading terms. The Likelihood function can be obtained simply by inverting eq. (7) and expressing the probability for the Gaussian variables $g$ as a function of the temperature field $\Phi$:

$$ g_\theta = f^{-1}(\Phi_\theta) = \Phi_\theta - \tilde f_{NL}\left(\Phi_\theta^2 - \sigma^2\right) + 2\tilde f_{NL}^2\,\Phi_\theta\left(\Phi_\theta^2 - \sigma^2\right) + \cdots . \tag{18} $$

The Likelihood will be a function of the parameter $\tilde f_{NL}$, while we keep $f_{NL}$ to denote the true value of the non-Gaussianity parameter. The dots represent higher order terms in $\tilde f_{NL}$ coming from the inversion of the function $f(g_\theta)$, which for the moment we neglect. We will come back to them shortly. In Fourier space, expression (18) translates into:

$$ g_l = f^{-1}(\Phi)_l = \Phi_l - \tilde f_{NL}\left((\Phi\circ\Phi)_l - \sigma^2 N_{pix}\,\delta_{l,0}\right) + 2\tilde f_{NL}^2\left((\Phi\circ\Phi\circ\Phi)_l - \sigma^2\Phi_l\right) + \cdots . \tag{19} $$
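As a quick numerical sanity check of the inversion in eq. (18), the following sketch applies the local map to Gaussian samples and compares the second-order inverse with the original field. It is only an illustration: it uses Python with NumPy (not part of the paper), works pixel by pixel in real space, and the names `local_map` and `approx_inverse` are introduced here. If the truncation is correct, the residual should scale like the cube of the non-linearity parameter, so doubling the parameter should increase the maximal error by a factor close to 8.

```python
import numpy as np

def local_map(g, f_nl, sigma2):
    """Local non-Gaussian map: Phi = g + f_NL (g^2 - sigma^2)."""
    return g + f_nl * (g**2 - sigma2)

def approx_inverse(phi, f_nl, sigma2):
    """Second-order inverse: g ~ Phi - f (Phi^2 - s2) + 2 f^2 Phi (Phi^2 - s2)."""
    chi = phi**2 - sigma2
    return phi - f_nl * chi + 2.0 * f_nl**2 * phi * chi

rng = np.random.default_rng(0)
sigma2 = 1.0                      # illustrative variance (assumption)
g = rng.standard_normal(10_000) * np.sqrt(sigma2)

def max_error(f_nl):
    """Largest deviation between the approximate inverse and the true g."""
    phi = local_map(g, f_nl, sigma2)
    return np.max(np.abs(approx_inverse(phi, f_nl, sigma2) - g))

err_small = max_error(1e-3)
err_large = max_error(2e-3)
ratio = err_large / err_small     # close to 2**3 if the residual is cubic
```

The two values of the parameter are arbitrary small numbers chosen so that the series converges for all samples.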

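Starting from minus the logarithm of the probability

$$ \mathcal{L}_g = \frac{1}{2}\sum_{l_1 l_2} C^{-1}_{l_1 l_2}\, g_{l_1} g_{l_2} , \tag{20} $$

we change variable from $g_l$ to $\Phi_l$, taking into account the change in the measure:

$$ \mathcal{L} = \frac{1}{2}\sum_{l_1 l_2} C^{-1}_{l_1 l_2}\left(f^{-1}(\Phi)\right)_{l_1}\left(f^{-1}(\Phi)\right)_{l_2} - \mathrm{Tr}\ln(J) , \tag{21} $$

where Tr stands for the trace in Fourier space, and $J$ is the Jacobian

$$ J_{l_1 l_2} = \frac{\partial\left(f^{-1}(\Phi)\right)_{l_1}}{\partial\Phi_{l_2}} . \tag{22} $$

We can now expand to second order in $\tilde f_{NL}$ to obtain:

$$ \mathcal{L} = \frac{1}{2N_{pix}}\sum_l \frac{1}{C_l}\left(\Phi_l\Phi_{-l} - 2\tilde f_{NL}\,\chi_l\Phi_{-l} + \tilde f_{NL}^2\left(\chi_l\chi_{-l} + 4\Phi_l\eta_{-l}\right)\right) + 2\tilde f_{NL}\,\Phi_{l=0} - 4\tilde f_{NL}^2\,\chi_{l=0} - 2\tilde f_{NL}^2 N_{pix}\sigma^2 , \tag{23} $$

where we have introduced the field $\eta_l = (\chi\circ\Phi)_l$. Although the Likelihood contains all the information on the parameter $f_{NL}$ one can derive from an experiment, its computation as a function of $f_{NL}$ can be very challenging in practice. All analyses to date have used an estimator of $f_{NL}$ rather than calculating the full Likelihood (e.g. [16,17]).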
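The structure of the second-order Likelihood can be checked numerically in a simplified setting. The sketch below assumes uncorrelated pixels with variance $\sigma^2$ (a white-noise toy model, not the scale-invariant spectrum of the paper), for which the Fourier sums reduce to sums over pixels, and compares the expansion with the exact minus log-Likelihood obtained from the closed-form inverse of the local map. Function names and parameter values are illustrative choices.

```python
import numpy as np

def exact_neg_log_like(phi, f_trial, sigma2):
    """Exact -log L for uncorrelated pixels (toy assumption)."""
    # Invert Phi = g + f (g^2 - sigma2) on the branch connected to g = Phi.
    g = (-1.0 + np.sqrt(1.0 + 4.0 * f_trial * (phi + f_trial * sigma2))) / (2.0 * f_trial)
    # Gaussian term minus log-Jacobian: -ln(dg/dPhi) = ln(1 + 2 f g).
    return np.sum(g**2) / (2.0 * sigma2) + np.sum(np.log1p(2.0 * f_trial * g))

def second_order_neg_log_like(phi, f_trial, sigma2):
    """Real-space, white-noise version of the second-order expansion."""
    chi = phi**2 - sigma2
    eta = chi * phi
    quad = np.sum(phi**2 - 2.0 * f_trial * chi * phi
                  + f_trial**2 * (chi**2 + 4.0 * phi * eta)) / (2.0 * sigma2)
    trace = (2.0 * f_trial * np.sum(phi) - 4.0 * f_trial**2 * np.sum(chi)
             - 2.0 * f_trial**2 * phi.size * sigma2)
    return quad + trace

rng = np.random.default_rng(1)
sigma2, f_true, n_pix = 1.0, 0.02, 1000      # illustrative values
g = rng.standard_normal(n_pix) * np.sqrt(sigma2)
phi = g + f_true * (g**2 - sigma2)

f_trial = 0.01
diff = (exact_neg_log_like(phi, f_trial, sigma2)
        - second_order_neg_log_like(phi, f_trial, sigma2))
```

In this toy model the difference `diff` is of third order in the trial parameter, much smaller than the first and second order terms, which are of order one for these values.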

The statistical properties of $\mathcal{L}$ depend on the underlying true value $f_{NL}$. To make this explicit we can write

$$ \Phi_l = g_l + f_{NL}\left((g\circ g)_l - N_{pix}\,\sigma^2\,\delta_{l,0}\right) . \tag{24} $$

Plugging back in the expression for the Likelihood, we obtain:

^ = Tvyr ( 919-1 + '^{hh - Inl) {xi9-i - '^hhigi^^^ (25) 



+ (/nl - /nl) ixiX-i + '^9iV-i + '^{9iV-i - {91V-1))) 
+2/nl^5/=o + (2/nl/nl - 4/^l) ^Xi=o - 2M^t<ipi^a^, 
where we have analogously defined Xi = {9 ° 9)1 ~ ^ (^"^^ifi: and fji = {xo g)i- 



"Notice that in this section, to keep the formulas as simple as possible, we have assumed that 
the average of $\Phi$ in the surveyed patch of the sky is observed. As will become clear later, even if this were not the case, the results would not change significantly.
We can use the expression of the Likelihood we have just derived to find the Cramer-Rao bound on the variance of an unbiased estimator for $f_{NL}$:



<jf>-'=(i:0^««-, + 2,„)_,>) (26) 
where we have used that: 

^YfQ(9iV-i) = Yl'^^^9i^^9-i-i' ^Yl9i'-i"9i" -^(t'^5i>^(^) (27) 

ll'l" '■ I 
and analogously: 

We see from eq. (13) that the estimator of the last section saturates the Cramer-Rao bound for sufficiently small $f_{NL}$.
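For orientation, the Cramer-Rao bound can be evaluated in closed form in a toy setting. The sketch below assumes uncorrelated pixels with variance $\sigma^2$ and a vanishing true $f_{NL}$ (assumptions made only for this illustration; they remove the logarithmic behavior discussed in the text). In that case a short Gaussian-moment computation gives an average curvature of the second-order Likelihood equal to $6\sigma^2 N_{pix}$, which the code compares with the curvature measured on one simulated realization.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0                      # illustrative pixel variance (assumption)
n_pix = 1_000_000
g = rng.standard_normal(n_pix) * np.sqrt(sigma2)   # data with true f_NL = 0

chi = g**2 - sigma2
eta = chi * g

# Second derivative of the second-order -log L with respect to the trial
# parameter, evaluated on this realization (white-noise toy model).
curvature = (np.sum(chi**2 + 4.0 * g * eta) / sigma2
             - 8.0 * np.sum(chi) - 4.0 * n_pix * sigma2)

fisher_analytic = 6.0 * sigma2 * n_pix      # ensemble average of the curvature
cramer_rao_sigma = fisher_analytic**-0.5    # smallest achievable error bar
```

With many pixels the realization-by-realization curvature is very close to its average in this toy model; the point of the section is that for a scale-invariant spectrum this is no longer automatically the case.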

The expansion of the Likelihood to second order is consistent. At this point, one may wonder whether the higher order terms in $f_{NL}$ might significantly alter this result through terms that, though suppressed by powers of $f_{NL}$, are enhanced by factors of $N_{pix}$, as happened in the former section for the variance of the estimator $\mathcal{E}$. It is quite straightforward to check that this is not the case. For example, at quartic level in $f_{NL}$ there are terms like:

fNLj2'^l'^9°9°9)i{9°9°9)-i (29) 

= /nl ^ 9l-h-l29h9l2-Q^9-i-i^-l^9l^9i^^ 
I hhhh 



whose expectation value contributes to the Cramer-Rao bound in eq. (26) with terms like:

fL E = fL E ^T^^ - /NL-^Np^x . (30) 

We recognize this term as not being enhanced also thanks to our rule of thumb, according to which a term is not enhanced if there are no $C_l$s raised to a power larger than one. We conclude that for these terms there is no $N_{pix}$ enhancement, and they are therefore suppressed by genuine powers of $f_{NL}A^{1/2}$ with respect to the leading terms in the Cramer-Rao bound in eq. (26). Higher order terms will appear in even powers of $f_{NL}$, and will give similar contributions of the form:

f2n-2 Sr^ 1 Cl^Cl^ . . . Cl^^^ r2n-2 2n^ . (o, \ 

•'NL Cml^ -^NL ^ i^pix , l-sJ^j 
so that we see there is no $N_{pix}$ enhancement for all these terms. At quartic level in $f_{NL}$ there are also terms appearing from the expansion of the Jacobian in the Likelihood, as for example:

/nl-^ X] 9ii9h9i39-h-i2-i3y (32) 

whose expectation value gives subleading, non-enhanced contributions to the Cramer-Rao bound of the form:

/nl-j^ X] ^hCh = /nl-j^ ( X] ) °^ /NL^^^Npix, (33) 
hh \ I / 

and the same conclusion applies unaltered to the higher order terms coming from the 
expansion of the Jacobian: 

/n^^ E C^'.C^^3---C^z.„-.=/^r^(E^y«/NrV"Np.. (34) 

We conclude that these higher order terms, not being enhanced by $N_{pix}$, do not alter significantly the Cramer-Rao bound in eq. (26).

In order to verify the consistency of the expansion at quadratic order in $f_{NL}$ for the Likelihood as well, we need to check that the higher order terms are irrelevant not only on average, but also on each realization. In appendix A we show that their variance scales at most as $N_{pix}^2$, which makes their contribution to the Likelihood suppressed by powers of $f_{NL}A^{1/2}$ with respect to the contribution of the quadratic terms. We conclude that the expansion of the Likelihood up to quadratic order is consistent.

Higher order terms in the relation between $\Phi$ and $g$ are negligible. Since in this section we have been very careful in keeping track of higher order terms in $f_{NL}$, it is useful to comment on the possibility that additional contributions come from the presence of higher order terms in the relation between $\Phi$ and the underlying Gaussian field of eq. (7):

$$ \Phi_\theta = g_\theta + f_{NL}\left(g_\theta^2 - \sigma^2\right) + \alpha f_{NL}^2\, g_\theta\left(g_\theta^2 - \sigma^2\right) + \cdots , \tag{35} $$

with $\alpha$ an unknown real parameter. Physically we expect these corrections to be there with $\alpha$ of order one. We need the Likelihood at second order in $f_{NL}$, so one might worry that our results now depend on $\alpha$. This would be very strange, as physically these third order terms in the expansion above are a very small correction to the already small second order terms. The data should not be sensitive at all to $\alpha$. One can check that this is indeed what happens. We leave the details of the algebra to appendix B. We will find that, although $\alpha$ enters into the Likelihood at order $f_{NL}^2$, terms containing $\alpha$ cancel on each realization up to terms suppressed by $1/N_{pix}$. Therefore there is no sensitivity to $\alpha$ and it can safely be set to zero.

In summary, we have written the Likelihood for $f_{NL}$ up to second order in $f_{NL}$, proving that this expansion is consistent, with the higher order terms only giving negligible contributions suppressed by powers of $f_{NL}A^{1/2}$. This has also allowed us to verify that higher order terms in $f_{NL}$ in the Likelihood do not give rise to enhanced contributions to the Cramer-Rao bound. Thus we conclude that the result in eq. (26) is only corrected by terms of order $f_{NL}A^{1/2}$, without $N_{pix}$ enhancement. Given what we observed in the previous section, this was not guaranteed a priori.



4 No need to worry: simple estimators can saturate the Cramer-Rao bound

Now that we have determined the Cramer-Rao bound, in this section we look for a new 
estimator which continues to have a variance close to the bound even in the high signal- 
to-noise regime. We do this to further understand the origin of the enhanced terms and 
point out how a simple change in the estimator based on our intuitive understanding can 
make the estimator saturate the bound. 

We will start from the original estimator $\mathcal{E}$ of sec. 2 and we will show explicitly how the enhanced terms cause the slow convergence of the estimator in the large signal-to-noise regime. After this it will become easy to guess a new estimator that, apart from small corrections, saturates the Cramer-Rao bound when the enhanced terms are large.



Explicit origin of the increased variance. Let us therefore start from the estimator $\mathcal{E}$ in eq. (11) and express it in terms of Gaussian variables as:



S = iTTT X] 7^ iaiX-i + /nl {xiX-i + '^giV-i)) = £0 + InlSi H (36) 



where the dots represent higher order terms in $f_{NL}$, and where we have defined:

^1 = E ^ + '^Sm-i) ■ (38) 

I ' 

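Notice that $\langle\mathcal{E}_0\mathcal{E}_1\rangle = 0$, $\langle\mathcal{E}_0\rangle = 0$, and $\langle\mathcal{E}_1\rangle = 1$. Therefore the variance of the estimator can be written as:

$$ \langle\Delta\mathcal{E}^2\rangle = \langle\mathcal{E}_0^2\rangle + f_{NL}^2\,\langle\Delta\mathcal{E}_1^2\rangle . \tag{39} $$

As we discussed, the variance at zeroth order in $f_{NL}$ is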

(A£^>,„,.„ = (4> = ^ (40) 

which saturates the Cramer-Rao bound in (26). However, as we noted in sec. 2, the variance of $\mathcal{E}_1$ behaves like:

(^^^i) - -rL- , (41) 

vr In^ Npix 

decreasing only logarithmically. Therefore it is going to dominate the variance of the estimator for $N_{pix}/\ln N_{pix} \gtrsim 1/(f_{NL}^2 A)$. Apart from the logarithm, this is when the signal-to-noise becomes of order 1.
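To get a feeling for this crossover, the condition $N_{pix}/\ln N_{pix} \gtrsim 1/(f_{NL}^2 A)$ can be solved numerically. The sketch below ignores all order-one coefficients, and the input values are placeholders chosen for illustration only, not numbers taken from the paper; the function name `crossover_n_pix` is likewise introduced here.

```python
import math

def crossover_n_pix(f_nl, amplitude):
    """Smallest power of two N with N / ln N >= 1 / (f_NL^2 A).

    Order-one factors are ignored, so the result is an order-of-magnitude
    estimate of where the enhanced term starts to dominate the variance.
    """
    target = 1.0 / (f_nl**2 * amplitude)
    n = 2
    while n / math.log(n) < target:
        n *= 2
    return n

# Placeholder inputs (assumptions): f_NL = 50, A = 1e-9.
n_star = crossover_n_pix(f_nl=50.0, amplitude=1e-9)
```

Because of the quadratic dependence, halving the assumed value of the non-linearity parameter pushes the crossover to roughly four times as many pixels, up to the logarithm.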
The enhanced variance of $\mathcal{E}_1$ could have been anticipated. This term contains, at leading order in $f_{NL}$, all the signal of non-Gaussianity. We know that for the local kind of non-Gaussianity the signal comes from squeezed configurations with two of the three $l$s very large, and one of the $l$s very small. The contribution to $\mathcal{E}_1$ of squeezed configurations with the smallest of the $l$s within a certain decade is roughly independent of the decade, although there are very few low $l$ multipoles. The value of $\mathcal{E}_1$ on a given realization depends quite strongly on the particular value of the few lowest multipoles; this explains why it converges to its average $\langle\mathcal{E}_1\rangle = 1$ very slowly**. Progressively the contribution of the first decades of modes becomes negligible, so that the dependence on the particular value of the lowest multipoles goes away. However, this happens only logarithmically in $N_{pix}$, as this is the way in which the contribution from the lowest multipoles decays.
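The statement that every logarithmic interval of scales contributes roughly equally can be illustrated with a simple lattice sum. The sketch below assumes a two-dimensional flat-sky grid of integer multipoles and a scale-invariant spectrum $C_l \propto 1/l^2$ (both are assumptions of this illustration), and sums $1/l^2$ over bands spanning a fixed factor of 4 in $|l|$. In the continuum each band would give $2\pi\ln 4$; on the lattice the lowest band, which contains only a handful of modes, deviates the most.

```python
import numpy as np

def band_sum(l_lo, l_hi, l_max):
    """Sum of 1/|l|^2 over 2D integer modes with l_lo <= |l| < l_hi."""
    ls = np.arange(-l_max, l_max + 1)
    lx, ly = np.meshgrid(ls, ls, indexing="ij")
    l2 = (lx**2 + ly**2).astype(float)
    mask = (l2 >= l_lo**2) & (l2 < l_hi**2)   # excludes the l = 0 mode
    return float(np.sum(1.0 / l2[mask]))

l_max = 256
bands = [band_sum(4**k, 4**(k + 1), l_max) for k in range(4)]
continuum = 2.0 * np.pi * np.log(4.0)         # continuum value per band
ratios = [b / continuum for b in bands]
```

The few modes of the lowest band carry as much weight as the many thousands of modes in the highest one, which is why their cosmic variance is not averaged away quickly.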

Improved Estimator. Now that we understand better the problem of the estimator $\mathcal{E}$, it is easy to find an improved estimator for the large signal-to-noise regime. We can think about the large variance of $\mathcal{E}$ as coming from a "wrong normalization". Although the estimator is clearly unbiased, its value strongly depends on the amplitude of the low $l$ modes, so that if on a particular realization we have a small amplitude in the first multipoles, the value of the estimator will be small, and vice versa. This effect cancels on average (that is why the estimator is unbiased) but it is the source of the large variance. However, this effect can clearly be corrected, as we know the amplitude of the low $l$ modes in each particular realization: we just have to divide by a "realization dependent"
normalization. We define a new estimator £ 

.S = ^^li^ = (42) 



_ El Vl~9lX-l 

- JNL + ^ 1 , , ^ ~ N + • • • - JNL + + • • ■ 

111 ci wx-i + 2giv-i) ^1 

where in the second line we have expressed everything in terms of Gaussian variables. Neglected terms are suppressed with respect to the ones we kept by genuine powers of $f_{NL}A^{1/2}$. Neglecting these terms, the new estimator is unbiased, i.e. its expectation value is $f_{NL}$, as $\mathcal{E}_0/\mathcal{E}_1$ is an odd function of the Gaussian variables $g$ and thus has zero average.

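We can now verify that the new estimator converges to the Cramer-Rao bound. We can write $\mathcal{E}_1 = 1 + \delta\mathcal{E}_1$, where $\delta\mathcal{E}_1$ is of the order of $\langle\Delta\mathcal{E}_1^2\rangle^{1/2} \simeq 2\sqrt{2}/(\pi^{1/2}\ln N_{pix})$. For large $N_{pix}$ we can thus expand the denominator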

£^fNL + £o- £oS£i . (43) 

The variance introduced by the third piece scales like $1/(N_{pix}\ln^3 N_{pix})$, i.e. it decreases more rapidly than the Cramer-Rao bound $\propto 1/(N_{pix}\ln N_{pix})$. After a while we are therefore left with the variance of $\mathcal{E}_0$ that, as we know, saturates the Cramer-Rao bound. It is worth noticing that already at the level of the WMAP experiment $\ln^2 N_{pix} \simeq 35$, so the deviation of



**As shown in appendix A, terms in $\mathcal{E}$ of higher order in $f_{NL}$, even including a possible contribution from terms proportional to $\alpha$, contribute to the variance of the estimator with terms which are not enhanced by powers of $N_{pix}$ more than $\mathcal{E}_1$ already is. Therefore they are suppressed with respect to the contribution of $\mathcal{E}_1$ by a genuine power of $f_{NL}A^{1/2}$.
this estimator from the Cramer-Rao bound is already rather small. The important point is that this good behavior of the new estimator is not spoiled when we enter the large signal-to-noise regime.
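The scaling of the correction term can be made explicit with a short worked step. The factorization of the expectation value in the second line is only a scaling estimate (an assumption of this sketch), and numerical coefficients are dropped throughout:

```latex
\begin{align}
\frac{\mathcal{E}_0}{\mathcal{E}_1}
  &= \frac{\mathcal{E}_0}{1+\delta\mathcal{E}_1}
   = \mathcal{E}_0 - \mathcal{E}_0\,\delta\mathcal{E}_1
     + O(\delta\mathcal{E}_1^2), \\
\left\langle (\mathcal{E}_0\,\delta\mathcal{E}_1)^2 \right\rangle
  &\sim \langle \mathcal{E}_0^2\rangle\,\langle \delta\mathcal{E}_1^2\rangle
   \propto \frac{1}{N_{pix}\ln N_{pix}} \times \frac{1}{\ln^2 N_{pix}}
   = \frac{1}{N_{pix}\ln^3 N_{pix}} .
\end{align}
```

Relative to the leading variance, the correction is therefore suppressed by two powers of the logarithm, which is the quantity evaluated for WMAP in the text.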

The improved normalization only depends on the large scales. Our understanding of the enhanced variance of the estimator $\mathcal{E}$ relies on the fact that a significant fraction of the signal always comes from low $l$ modes with a large intrinsic variance. If this is true, the solution to this problem had better depend strongly on the low $l$s. Here we therefore verify that $\mathcal{E}_1$ can be written to good approximation in terms of just the first few modes. In appendix B we have shown that



I ' 



^2 = o7^5'^-' (^^) 
is fully correlated with the quantity 



Si = 919-1, (45) 



I 

up to corrections $O(1/N_{pix})$, so that on each realization $\Delta S_2 = 3\Delta S_1$ with very good accuracy. In the same fashion one can prove that also the quantity

* = E(J^X«-, (46) 

is fully correlated with $S_1$ (up to corrections $O(1/N_{pix})$) and that $\Delta S_3 = 4\Delta S_1$. This implies that also $\mathcal{E}_1$ is fully correlated with $S_1$ and therefore, on each realization, we can write:

^.^^0+^. (47) 

Now, the important point is that in order to compute the quantity $\Delta S_1$ on a given realization one needs, to good approximation, only the first few modes. This can be seen from the computation of the variance of $S_1$ in Appendix B:

" I " V'min 'max/ 

which shows that the contribution of the high $l$s to the variance is completely irrelevant. We therefore conclude that the value of $\mathcal{E}_1$ on each realization can be determined just by looking at the first few modes, in agreement with our intuition.

This last remark has also relevant consequences from the computational point of view. 
In fact it seems at first sight very hard to use the new estimator in the analysis of CMB 
data, as it contains 4-point functions and one has to deal with the complications of the 
spherical geometry and of the transfer function. On the other hand the dependence on 
only the first few modes makes the modification computationally quite light. 
Relation to the full Likelihood calculation. The Likelihood (23) contains all the information on the parameter $f_{NL}$. To reconstruct it from the data we just need the coefficients of the terms linear and quadratic in $f_{NL}$. These two combinations of the data are sufficient statistics for $f_{NL}$. Notice that, given our discussion above, it is not so complicated to analyze the full Likelihood function: we just need the same kind of terms entering the improved estimator above. A natural question is whether one can get better constraints on $f_{NL}$ using the full Likelihood function instead of an estimator.

First of all, it is straightforward to check that the maximum Likelihood estimator, which can be easily derived from eq. (23), has the same good properties as our improved estimator discussed above. It is unbiased up to corrections $O(1/N_{pix}^{1/2})$ and it asymptotically saturates the Cramer-Rao bound up to corrections decaying as $1/\ln^2 N_{pix}$.

The Cramer-Rao bound, being the average value of the second derivative of the log-Likelihood, gives the average value of the error bars that one gets. The difference in using the full Likelihood is that its curvature changes realization by realization, as it is given by the coefficient of the term quadratic in $f_{NL}$ in eq. (23). Usually this distinction between the curvature of the Likelihood in a particular realization and its average value is irrelevant, as the difference scales like $1/N_{pix}$. This is not true in our case. The variance of the curvature of the Likelihood function only scales as $1/\ln^2 N_{pix}$. This is again intuitive: given the strong dependence on the lowest multipoles, a realization with an excess of power in the low $l$s compared with the average will be more constraining than one with suppressed power on large scales. In the first case it is in fact easier to see the non-Gaussian correlation between the low $l$s and the short scale power. This difference is in any case not that large: for real experiments that have a chance of detecting $f_{NL}$, $1/\ln N_{pix}$ is rather small.

We reach an important conclusion. The use of our improved estimator, or equivalently the maximum Likelihood one, is equivalent to the full Likelihood of the data up to small corrections suppressed by $1/\ln N_{pix}$. This closes the door to any additional attempt to improve the limits on $f_{NL}$.

5 Comments on estimating $f_{NL}$ using the 4-point function

It has been proposed in [21] and more recently in [22] that an estimator for $f_{NL}$ based on the 4-point function has a variance which decreases as $1/N_{pix}^2$, so that it might be better than a 3-point function estimator for large enough $f_{NL}$. Of course our analysis in the previous sections shows that the Cramer-Rao bound scales as $1/(N_{pix}\ln N_{pix})$, so that no estimator can do better than this. Moreover we proved that a slight modification of the 3-point function estimator makes the new estimator approach asymptotically the Cramer-Rao bound. Rather than stop here and rely on the above "theorems" we want to show explicitly in this section what goes wrong in the naive calculation of the variance of the 4-point function estimator. We will see that the $1/N_{pix}^2$ scaling of the variance does not hold once the signal-to-noise is larger than one and the "enhanced" terms are taken into account. We will also show explicitly that there is no additional information about $f_{NL}$ in the 4-point function that is not already captured by the 3-point function.
The variance of the proposed 4-pt estimator has enhanced terms. Let us begin by proving that the variance of the estimator introduced in [21,22] does not scale as $1/N_{pix}^2$. The proposed estimator, analogously to the estimator $\mathcal{E}$ in the 3-point function case, is the linear combination of 4-point correlators which maximizes the signal-to-noise in the limit $f_{NL} \to 0$:

where the prefactor is a normalization constant which makes the estimator unbiased, $\langle\mathcal{E}_4\rangle = f_{NL}^2$. The sum is restricted to momentum conserving, $l_4 = -l_1 - l_2 - l_3$, non-degenerate quadrilaterals. The subscript 1 in $\langle\Phi_{l_1}\ldots\Phi_{l_4}\rangle_{c,1}$ means that the connected 4-point function is evaluated with $f_{NL} = 1$. The variance of this estimator is:

{ASD = (50) 



/nl • 



Here we are interested in the scaling with $N_{pix}$ of the different terms, therefore we do not keep track of the various combinatorial and numerical factors, nor of possible logarithmic corrections. Using the fact that the connected 4-point function behaves, at leading order in $f_{NL}$, like:

{^h^i2^h^h)c ~ /nl • Ci^Ci^+i^Ci, + symm , (51) 
we find that the denominator of the first term contains terms that behave like: 



2-^ n, n, n, n, n. n, ' ^ ^ 



\2 r^2 n2 r^2 n n'i 

IMS ChChCi,Ci, i^i.^Ci,Ci,Ci,Ci, Ci,Ci, 

We already met this kind of summation and we know that they are dominated by $l_1 + l_2 \sim l_{min}$ and enhanced by a factor of $N_{pix}$ with respect to the naive scaling.

2^ n n n n ^pi^ l^^i ^pix^ • (53) 

i^i, Ci,Ci,Ci,Ci, ^ 

For the numerator we have to compute the 8-point function, which in general will be the sum of the product of four 2-point functions, of two connected 3-point functions and one 2-point function, of two connected 4-point functions, and so on. At zeroth order in $f_{NL}$, we have only 2-point functions and we obtain something which scales as the square root of the denominator. Therefore, if one stops at this level, one finds that the variance decreases as $1/N_{pix}^2$. This is the result obtained in [21] and [22].

However, as we learned in sec. 2, there are other terms in the numerator which, though suppressed by powers of $f_{NL}A^{1/2}$, are enhanced by powers of $N_{pix}$. It turns out that the product of two 4-point functions, of the 5-point with the 3-point one, and of the 6-point with the 2-point one, all give rise to enhanced terms with the same scaling. For example a term with two connected 4-point functions is



^ Ci,Ci,Ci,Cu CrCrCrCr ^'^h^l^'^i,^iJc[^i,'i'i,VU)c 

-/nl2^2^ q,Q3 c.-q ^^^^ 

where $+\cdots$ represents terms of higher order in $f_{NL}$. The sums will be dominated by the region with $l_1 + l_2 \sim l_{min}$ and $l'_1 + l'_2 \sim l_{min}$, so that we obtain



/«ix(E^')'°^/NL^Xix (55) 



as we wanted to show. Putting together the behavior of the different terms we get the scaling of the variance of the proposed estimator, up to logarithmic corrections:

(^^4) ^2^^T > (56) 

where we see the importance of the enhanced terms in the numerator. For comparison, it is useful to write the same schematic relation for the analogous 3-point function estimator $\mathcal{E}$ we discussed in sec. 2:



AN, 



(Af^) ~ ■ (57) 



We notice that the enhanced terms become parametrically important for both estimators when $f_{NL}^2 A N_{pix} \sim 1$, which is, apart from logarithms, the regime in which the signal-to-noise is of order one.



Is there additional information in the 4-point function? In the limit in which the enhanced terms are negligible, we see that the variance of $\mathcal{E}_4$ is of the order of the square of the variance of $\mathcal{E}$, which means that $\mathcal{E}_4$ will give parametrically the same limits on $f_{NL}$ in this regime. However, in the previous sections we have shown that in the same limit the 3-point function estimator $\mathcal{E}$ saturates the Cramer-Rao bound; therefore in this regime nothing can be added by the use of the 4-point function. This agrees with the numerical result of [22], where it is shown that in this regime the limit on $f_{NL}$ obtained from the 4-point function estimator is always slightly worse than the one obtained using the 3-point function. Given the Cramer-Rao bound, we can even say something more: in this regime no improvement can be achieved by combining the two estimators.

On the other hand, when the enhanced terms become important we see that the variance of both estimators no longer decreases (apart from logarithmic terms). In particular there is no $1/N_{pix}^2$ scaling for the 4-point function. As was shown in sec. 4, in this regime one must consider "fractional" estimators, which are not just polynomial in the data.






Explicit relation between the 3-point and 4-point estimators. It is worth pointing out an explicit relationship between the 4-point and 3-point function estimators, to show that there is really nothing new in the 4-point estimator $\mathcal{E}_4$ which is not already taken into account using $\mathcal{E}$.

Let us recall once again that, for local non-Gaussianities, the signal-to-noise of the 3-point function is concentrated on squeezed configurations, where one of the three $l$s is small, and the other two are large and almost opposite. The 3-point function estimator $\mathcal{E}$ is basically doing a weighted sum of the signal contained in all the configurations, where the weight is the signal-to-noise ratio. Therefore, the estimator $\mathcal{E}$ can be well approximated by a sum over just the squeezed configurations:

L Z ' L 

where $L$ is a large scale multipole and $l$ is a small scale one, and $\mathcal{K}_L$ is defined as:

^L = E^^- (59) 

In the presence of a non-zero $f_{NL}$, $\mathcal{K}_L$ will contain, when expressed in terms of Gaussian variables, a contribution

$$ \mathcal{K}_L \sim 2 f_{NL}\, g_L . \tag{60} $$

The estimator $\mathcal{E}$ correlates this contribution with the long-wavelength mode $\Phi_{-L}$.

Analogously, in the case of the estimator $\mathcal{E}_4$, as we have seen, the signal comes from squeezed configurations, where the four vectors $l$ are approximately opposite in pairs. Therefore, the estimator $\mathcal{E}_4$ can be written to a good approximation as a sum over just these configurations:

£4 0,Y,Cl1ClK:-l ■ (61) 

L 

Again, in the presence of a non-zero $f_{NL}$ we are correlating the non-Gaussian contribution inside each of the $\mathcal{K}$'s, giving an average signal $\propto f_{NL}^2$.

From this it should be clear that the two estimators are not independent and that the 4-point one is less efficient, because the non-Gaussian contribution must come out of both the $\mathcal{K}$'s, while it is more efficient to directly correlate $\mathcal{K}_L$ with the mode $\Phi_{-L}$ as in eq. (58).

6 Summary 

It is perhaps unfortunate that our paper is filled with so many equations; the message, however, is simple. The analysis of the local type of non-Gaussianity for scale invariant perturbations is somewhat more subtle than one might have guessed: a naive $f_{NL}A^{1/2}$ expansion is not always appropriate. The physical origin of the effect is clear: long wavelength modes modulate the amplitude of the short wavelengths, and the amplitude of this modulation produced by long wavelengths of every decade in scale is the same. Cosmic variance severely affects these long wavelengths and, because their relative information contribution only decreases logarithmically with the number of pixels, one ends up with large variances for the naive $f_{NL}$ estimators.
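The modulation mentioned above can be written out in one line. In the following sketch the split of the Gaussian field into a long-wavelength part $g_L$ and a short-wavelength part $g_s$ is notation introduced here for illustration, and terms quadratic in $g_s$ or of higher order in $f_{NL}$ are dropped:

```latex
\begin{align}
g &= g_L + g_s , \\
\Phi &= g + f_{NL}\,(g^2 - \sigma^2)
   \;\supset\; g_s\,\bigl(1 + 2 f_{NL}\, g_L\bigr) , \\
\langle \Phi_s^2 \rangle\big|_{g_L}
  &\simeq \langle g_s^2 \rangle\,\bigl(1 + 4 f_{NL}\, g_L\bigr)
   + O(f_{NL}^2) .
\end{align}
```

The local short-scale power thus tracks the long mode linearly, and for a scale invariant spectrum each decade of long modes contributes the same variance to $g_L$, which is the origin of the logarithms in the text.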

The basic point is that when one calculates the normalization of the /nl estimator one 
uses the average level of large scale fluctuations as opposed to the power in the individual 
realization one happens to have. As a result, in the limit of large signal to noise, this 
relatively large uncertainty in the normalization of the estimator severely enhances its 
variance. Fortunately one knows the amplitude of the modes in a given realization by 
direct measurement so it is almost trivial to fix the problem by choosing a normalization 
that depends on the particular realization. 

Writing down the full Likelihood one can explicitly calculate how well one should in principle be able to constrain $f_{NL}$ and explicitly check how the effect mentioned above comes in. One can show that a simple modification of the naive estimator recovers all the information that the data contain and that in fact using that estimator is basically equivalent to calculating the full Likelihood, up to corrections $O(1/\ln N_{pix})$. As a result, one is also convinced that other statistics such as the 4-point function, Minkowski functionals, wavelets, etc. can at best extract as much information on $f_{NL}$ as the 3-point function. In any event they would not contribute additional information on $f_{NL}$, so once the 3-point function is measured there is nothing else to be done.

Acknowledgments We thank Daniel Babich and Eiichiro Komatsu for useful com- 
ments. L. S. is supported in part by funds provided by the U.S. Department of Energy 
(D.O.E) under cooperative research agreement DF-FC02-94ER40818. M. Z. is supported 
by the Packard and Sloan foundations, NSF AST-0506556 and NASA NNG05GG84G. 



Appendices 

A Proof that the enhancement is at most of order $N_{pix}$

In this appendix we want to prove that the variance of the sums that appear in the Likelihood and in the estimators scales at most as $N_{pix}^2$, i.e. that the possible enhancement with respect to the naive scaling is at most of order $N_{pix}$. In order to do this, following the discussion in sec. 2, where we explained the rule of thumb for discovering enhanced terms, it is enough to show that in the expression in Fourier space of the variance there is at most one $C_l$ raised at most to the power of two. This corresponds to an enhancement of one factor of $N_{pix}$, while further enhancements would require either one $C_l$ raised to a power larger than two, or more than one $C_l$ squared.

A good rearrangement of the various terms is obtained if we start in real space, where 
the terms we are interested in can be written in the general form: 





(62) 



0102 
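where $M$ and $N$ are two positive integers, and $C_{\theta_1\theta_2}$ is the covariance matrix in real space. This is related to the one in Fourier space by the following relation:

$$ C_{\theta_1\theta_2} = \frac{1}{N_{pix}}\sum_l C_l\, e^{i l\cdot(\theta_1-\theta_2)} . \tag{63} $$

Analogously, $C^{-1}_{\theta_1\theta_2}$ can be expressed in real space as:

$$ C^{-1}_{\theta_1\theta_2} = \frac{1}{N_{pix}}\sum_l \frac{1}{C_l}\, e^{i l\cdot(\theta_1-\theta_2)} . \tag{64} $$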



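Let us compute the variance of the general term in eq. (62). This will be the sum of terms of the form:

$$ \sum_{\theta_1\theta_2\theta_3\theta_4} C^{-1}_{\theta_1\theta_2}\, C^{-1}_{\theta_3\theta_4}\, (C_{\theta_1\theta_2})^{\alpha} (C_{\theta_1\theta_3})^{\beta} (C_{\theta_1\theta_4})^{\gamma} (C_{\theta_2\theta_3})^{\delta} (C_{\theta_2\theta_4})^{\epsilon} (C_{\theta_3\theta_4})^{\rho} , \tag{65} $$

with the positive integers $\alpha, \beta, \gamma, \delta, \epsilon, \rho$ constrained to satisfy $\alpha + \beta + \gamma + \delta + \epsilon + \rho = (N+M)/2$.

We can now express each of the $C$s and $C^{-1}$s in Fourier space with the relations (63) and (64). After this, the summation over the angles $\theta_1$, $\theta_2$, $\theta_3$ and $\theta_4$ becomes trivial, each of these giving a Kronecker delta. In particular, the summations over $\theta_1$ and $\theta_2$ give the constraints:

$$ l_1 + l^\alpha_1 + \cdots + l^\alpha_\alpha + l^\beta_1 + \cdots + l^\beta_\beta + l^\gamma_1 + \cdots + l^\gamma_\gamma = 0 , \tag{66} $$

$$ -l_1 - l^\alpha_1 - \cdots - l^\alpha_\alpha + l^\delta_1 + \cdots + l^\delta_\delta + l^\epsilon_1 + \cdots + l^\epsilon_\epsilon = 0 , \tag{67} $$

where $l_1$ is the $l$ associated to the Fourier transform of $C^{-1}_{\theta_1\theta_2}$, $l^\alpha_i$, with $i = 1, \ldots, \alpha$, are the $l$s associated to the Fourier transform of $(C_{\theta_1\theta_2})^\alpha$, and analogously for the other $C$s. These two constraints can be usefully rewritten as:

$$ l_1 = -l^\alpha_1 - \cdots - l^\alpha_\alpha - l^\beta_1 - \cdots - l^\beta_\beta - l^\gamma_1 - \cdots - l^\gamma_\gamma , \tag{68} $$

$$ l^\beta_1 + \cdots + l^\beta_\beta + l^\gamma_1 + \cdots + l^\gamma_\gamma + l^\delta_1 + \cdots + l^\delta_\delta + l^\epsilon_1 + \cdots + l^\epsilon_\epsilon = 0 . \tag{69} $$

From the summation over $\theta_3$ and $\theta_4$, we obtain two analogous constraints that can be written as:

$$ l_2 = l^\beta_1 + \cdots + l^\beta_\beta + l^\delta_1 + \cdots + l^\delta_\delta - l^\rho_1 - \cdots - l^\rho_\rho , \tag{70} $$

$$ l^\beta_1 + \cdots + l^\beta_\beta + l^\gamma_1 + \cdots + l^\gamma_\gamma + l^\delta_1 + \cdots + l^\delta_\delta + l^\epsilon_1 + \cdots + l^\epsilon_\epsilon = 0 , \tag{71} $$

where $l_2$ is the $l$ associated to the Fourier transform of $C^{-1}_{\theta_3\theta_4}$. We see that the second of these constraints is equivalent to the one in eq. (69). The presence of a redundant constraint is a manifestation of the fact that the term we started with in eq. (62) was rotationally invariant.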

After the summation over the angles, we are left with summations only over the $l$s:



iCia ■ ■ ■ Cla) ■ ■ ■ I CiP ■ ■ ■ ClP 
subject to the three independent constraints we have found. Notice that the $N_{pix}$ factors in (64) exactly cancel with the four coming from the sums over $\theta_i$. Now, the first and third constraints can be used to eliminate the summations over $l_1$ and $l_2$, leaving us with:



(C/a • ■ ■ Qg) • • • f Qp • • • 



'1 '"'p 



'' n 



(73) 



E 

la ]a 
n ■■■'a 

with $l_1$ and $l_2$ given by eq. (68) and (70). The only remaining constraint is eq. (69). After applying it, the variance is in a form such that we can quickly apply our rule of thumb and understand its level of enhancement. Naively, given that the number of $C_l$s is $(N+M)/2 - 2$ and the number of summations is $(N+M)/2 - 1$, we get a behavior $\sim N_{pix}$. But it can happen that the last constraint makes two $C_l$s in the numerator equal. In this case the sum goes as $N_{pix}^2$. No further enhancement is possible.

B Higher order corrections in the definition of 
local non-Gaussianities 



In Fourier space eq. (35) reads:

'^1 = 91 + hhXi + afLm ■ (74) 

In this appendix we want to prove that, although $\alpha$ enters in the Likelihood at order $f_{NL}^2$, this does not imply that data are sensitive to this parameter, unless it is huge compared to the naive estimate $\alpha \sim O(1)$. The new term gives the following contribution to the Likelihood at order $f_{NL}^2$:

= af'^L [ y] - imv-i - iaiv-i)) + 3^^X/=o^ , (75) 

where we have neglected terms which are independent of $f_{NL}$. We notice that both terms above have zero average. However this is not enough to prove that there is no relevant dependence on $\alpha$, because both terms have enhanced variance, so that they converge to zero very slowly. Their importance with respect to the other terms in the Likelihood decreases as $1/\ln^2 N_{pix}$. What we are now going to prove is that, although both terms have large variance, they are strongly correlated and their contributions in eq. (75) cancel up to terms suppressed by $1/N_{pix}$. Therefore the dependence on $\alpha$ is extremely small, as expected on physical grounds.
Defining:

^1 = if E 919-1 , ^2 = E ^^9iU , (76) 
we can write eq. (75) as:

= af^L + 3A5i) . (77) 

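We can now compute the correlation functions of $S_1$ and $S_2$:

$$ \langle S_1\rangle = \frac{1}{N_{pix}^2}\sum_l \langle g_l g_{-l}\rangle = \frac{1}{N_{pix}}\sum_l C_l , \tag{78} $$

$$ \langle\Delta S_1^2\rangle = \frac{1}{N_{pix}^4}\sum_{l l'}\langle g_l g_{-l}\, g_{l'} g_{-l'}\rangle - \langle S_1\rangle^2 = \frac{2}{N_{pix}^2}\sum_l C_l^2 , \tag{79} $$

$$ \langle S_2\rangle = \frac{2}{N_{pix}}\sum_l C_l , \tag{80} $$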

as computed before, and: 

II ' 

Yl (^^ Yl 9i'-l"9i" - ^'^^^i',^ j ) - (-52)' . 

The summation is dominated by those terms which come from the contraction of the $g_l$ in each $S_2$ with one of the three $g$s contained inside the $\eta_{-l}$ from the same $S_2$ term. These are enhanced by a factor of $N_{pix}$ with respect to the other contributions. Neglecting subleading terms we obtain:

$$ \langle\Delta S_2^2\rangle = \frac{18}{N_{pix}^2}\sum_l C_l^2 . \tag{82} $$



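Finally, for $\langle\Delta S_1\Delta S_2\rangle = \langle S_1 S_2\rangle - \langle S_1\rangle\langle S_2\rangle$, we obtain:

$$ \langle\Delta S_1\Delta S_2\rangle = \frac{6}{N_{pix}^2}\sum_l C_l^2 , \tag{83} $$

again keeping only leading terms. We see that $S_1$ and $S_2$ are fully correlated, up to corrections $O(1/N_{pix})$:

$$ \frac{\langle\Delta S_1\Delta S_2\rangle}{\langle\Delta S_1^2\rangle^{1/2}\,\langle\Delta S_2^2\rangle^{1/2}} = 1 . \tag{84} $$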



On each realization we have: 



= = ' ^^^^ 

and therefore $\Delta S_2 = 3\Delta S_1$ up to terms suppressed by $1/N_{pix}$.

We conclude that terms which depend on $\alpha$ are negligible in the Likelihood.
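The correlation between $S_1$ and $S_2$ can also be checked with a small Monte Carlo. The sketch below is an illustration under stated assumptions, not the computation of the paper: it uses a periodic two-dimensional grid, a spectrum $C_l = 1/l^2$ with the $l = 0$ mode removed, the normalizations $S_1 = N_{pix}^{-2}\sum_l g_l g_{-l}$ and $S_2 = N_{pix}^{-2}\sum_l g_l\eta_{-l}/C_l$ with $\eta$ built from the Gaussian field as $g(g^2-\sigma^2)$ in real space, and NumPy's unnormalized forward FFT so that $\langle g_l g_{-l}\rangle = N_{pix} C_l$. The function name `s1_s2_samples` is introduced here.

```python
import numpy as np

def s1_s2_samples(n_side=64, n_real=300, seed=0):
    """Draw Gaussian maps and return samples of S_1 and S_2 (toy setup)."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n_side) * n_side          # integer wavenumbers
    k2 = k[:, None]**2 + k[None, :]**2
    c_l = np.zeros_like(k2)
    c_l[k2 > 0] = 1.0 / k2[k2 > 0]               # C_l = 1/l^2, no monopole
    inv_c = np.zeros_like(k2)
    inv_c[k2 > 0] = k2[k2 > 0]                   # 1/C_l, zero mode dropped
    n_pix = n_side**2
    sigma2 = c_l.sum() / n_pix                   # pixel variance of g
    s1 = np.empty(n_real)
    s2 = np.empty(n_real)
    for i in range(n_real):
        white = rng.standard_normal((n_side, n_side))
        g_l = np.fft.fft2(white) * np.sqrt(c_l)  # <g_l g_-l> = N_pix C_l
        g = np.fft.ifft2(g_l).real
        eta_l = np.fft.fft2(g * (g**2 - sigma2))
        s1[i] = np.mean(g**2)                    # equals sum_l |g_l|^2 / N_pix^2
        s2[i] = np.sum(inv_c * (g_l * np.conj(eta_l)).real) / n_pix**2
    return s1, s2, sigma2

s1, s2, sigma2 = s1_s2_samples()
corr = np.corrcoef(s1, s2)[0, 1]                 # expected to be close to 1
slope = np.polyfit(s1, s2, 1)[0]                 # expected to be close to 3
```

On a grid of this size the correlation is not exactly one, since the corrections are only suppressed by the number of pixels, but the fitted slope should be close to the factor 3 quoted above.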